Skip to content

feat(memory): host-managed memory lifecycle — two-lane retrieval + after-turn record (mem0 flow)#5326

Closed
BenKurrek wants to merge 1 commit into
nearai:reborn/memory-lift-followupsfrom
BenKurrek:reborn/memory-lifecycle
Closed

feat(memory): host-managed memory lifecycle — two-lane retrieval + after-turn record (mem0 flow)#5326
BenKurrek wants to merge 1 commit into
nearai:reborn/memory-lift-followupsfrom
BenKurrek:reborn/memory-lifecycle

Conversation

@BenKurrek

Copy link
Copy Markdown
Collaborator

Summary

Implements the mem0 host-managed memory flow on top of #5205 (reborn/memory-lift-followups), confined to the native memory provider + run-level orchestration. Memory now reaches the model proactively (the loop previously hardcoded memory_snippets: Vec::new()), and each completed turn is recorded back so the short-term lane is populated.

on_run_start:  long_term  = search(user scope)
               short_term = search(thread scope)
before_model:  prompt = system + long_term + short_term + conversation
after_turn:    record_interaction(messages=[user, assistant], run_id, metadata)

What changed

Retrieval (once per run)

  • Native retrieve_context scopes by invocation.scope.thread_id: Some(T) → only that thread's threads/<T>/ subtree (short-term); None → the user's general memory, excluding threads/* (long-term). Disjoint lanes → the host concatenates them (short-term first) under the existing 512 B/snippet + 4 KiB-aggregate admission.
  • ProductionMemoryPromptContextService::load_memory_snippets fetches both lanes once (long-term via ResourceScope::without_thread_and_mission()); ThreadBackedLoopContextPort caches the result in a per-run OnceCell and surfaces it into the "memory" prompt section on every model step.
  • Wired composition → loop_driver_host factory → port, mirroring user_profile_source. Graceful degradation everywhere — memory never breaks a turn.

Recording (after each turn) — "provider decides"

  • New low-level MemoryService::record_interaction(invocation, { messages, run_id, metadata }) — the mem0 add data shape (user_id/agent_id/thread_id ride the invocation scope). A default no-op trait impl lets each provider opt in: the host passes the DATA and the provider decides what to do with it (store verbatim, run LLM extraction, or nothing). Implements the reserved memory.interaction.record.v1 vocabulary.
  • The native provider stores the full turn history under threads/<thread_id>/log.md.
  • AfterTurnMemoryRecorder fires at turn_run_executor::apply_exit (gated on Completed), reads the exchange from the thread transcript with the owner-rewritten scope, and hands it down. Post-terminal, best-effort — every error is debug!-logged and never fails the already-completed run.

Testing

Unit + caller-level across ironclaw_memory{,_native}, host_runtime, loop_support, turns, reborn:

  • native two-lane scoping (short-term thread filter; long-term exclusion); host two-lane fetch + per-lane degradation + combined budget; prompt rendering from memory_snippets; once-per-run cache (load_loop_context fetch count stays 1); native record_interactionretrieve_context surfaces it; and a full-turn record through the executor (turn_runner_worker_records_after_turn_memory_on_completed_run).
  • cargo clippy clean on the touched crates; ironclaw_reborn_cli builds. (3 sandbox_process tests fail locally only without Docker; 1 pre-existing unused import: Path warning lives in untouched reborn_cli.)

Known items / follow-ups

  • Production graph wires None — reads + writes resolve on the local-dev runtime path only; the production graph degrades to no memory (issue Reborn: wire production-graph composition for optional context sources (identity + profile) #5013, same as user_profile_source).
  • Full-composition e2e (record in one run → surface in a later run's model request) not yet written; the individual links are unit + caller tested.
  • Budget tuning: the combined block is short-term-first, capped at max_snippets — a scratch-heavy thread can starve the long-term lane (per-lane sub-budgets are a later refinement).
  • Phase 3 (on_run_end durable summary) intentionally deferred.

Design doc: docs/superpowers/specs/2026-06-25-reborn-memory-host-lifecycle-design.md

🤖 Generated with Claude Code

…ter-turn record (mem0 flow)

Implements the mem0 host-managed memory flow on top of nearai#5205, with the surface
area confined to the native memory provider and run-level orchestration.

Retrieval (once per run):
- The loop fetches long-term (user-general) and short-term (per-thread,
  `threads/<thread_id>/`) memory once at the first prompt build of a run and
  injects both into the prompt's "memory" section, replacing the dead
  `memory_snippets: Vec::new()` in loop_support. A per-run OnceCell caches the
  fetch; subsequent model steps reuse it.
- Native `retrieve_context` scopes by `invocation.scope.thread_id`: Some(T) →
  only that thread's `threads/<T>/` subtree (short-term); None → the user's
  general memory, excluding `threads/*` (long-term). The lanes are disjoint, so
  the host concatenates them (short-term first) under the existing 4 KiB
  admission budget.
- Graceful degradation throughout: a memory failure degrades the lane to empty
  and never breaks a turn.

Recording (after each turn):
- New low-level `MemoryService::record_interaction(invocation, { messages,
  run_id, metadata })` — the mem0 `add` data shape (`user_id`/`agent_id`/
  `thread_id` ride the invocation scope). A default no-op trait impl lets each
  provider opt in: the host passes the DATA and the provider decides what to do
  with it (store verbatim, run LLM extraction, or nothing). Implements the
  reserved `memory.interaction.record.v1` vocabulary.
- The native provider stores the full turn history under `threads/<thread_id>/`.
- A host `AfterTurnMemoryRecorder` fires at the run-end seam
  (`turn_run_executor::apply_exit`, gated on `Completed`), reads the exchange
  from the thread transcript with the owner-rewritten scope, and hands it down.
  Post-terminal and best-effort: every error is `debug!`-logged and never fails
  the already-completed run.

Reads and writes resolve on the local-dev runtime path; the production graph
wires `None` (deferred, issue nearai#5013 — the same optionality as
`user_profile_source`).

Tested at unit + caller level across ironclaw_memory{,_native}, host_runtime,
loop_support, turns, and reborn (two-lane fetch, prompt rendering, once-per-run
cache, native record→retrieve, and a full-turn record through the executor). A
full-composition e2e (record in one run → surface in a later run's model
request) is called out as a follow-up.

Design: docs/superpowers/specs/2026-06-25-reborn-memory-host-lifecycle-design.md

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

🗂️ Base branches to auto review (2)
  • staging
  • reborn-integration

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: fb5e3b42-9994-4965-a391-7141b7603fe4

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@github-actions github-actions Bot added size: XL 500+ changed lines risk: low Changes to docs, tests, or low-risk modules scope: docs Documentation scope: dependencies Dependency updates contributor: experienced 6-19 merged PRs labels Jun 26, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements a host-managed memory lifecycle (mem0 flow) for the Reborn planned loop, introducing proactive memory retrieval across short-term and long-term lanes, as well as after-turn interaction recording. The feedback is highly constructive and identifies several key areas for optimization and robustness: concurrently fetching memory lanes to reduce latency, preserving distributed tracing correlation IDs, wrapping background interaction recording in a timeout to prevent blocking worker threads, filtering out empty messages before storage, and ensuring failed memory fetches are cached as empty to avoid repeated timeout penalties on subsequent iterations.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +464 to +485
let cached = self
.memory_snippets_cache
.get_or_try_init(|| async {
let Some(request) = self.build_memory_prompt_context_request(context_messages)
else {
return Ok(Vec::new());
};
service.load_memory_snippets(request).await
})
.await;
match cached {
Ok(snippets) => snippets.clone(),
// A retrieval failure must never break the turn: degrade to empty.
// The cell stays uninitialized, so a later iteration may retry.
Err(error) => {
tracing::debug!(
kind = ?error.kind,
"memory context fetch failed; degrading to empty memory for this run"
);
Vec::new()
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using get_or_try_init leaves the OnceCell uninitialized if the memory service fails. Consequently, every subsequent iteration (model step) in the same run will attempt to fetch memory again, repeatedly hitting timeouts/failures and severely degrading performance. Using get_or_init and caching an empty vector on failure ensures we only attempt to fetch memory exactly once per run, preventing repeated latency penalties.

        let cached = self
            .memory_snippets_cache
            .get_or_init(|| async {
                let Some(request) = self.build_memory_prompt_context_request(context_messages)
                else {
                    return Vec::new();
                };
                match service.load_memory_snippets(request).await {
                    Ok(snippets) => snippets,
                    Err(error) => {
                        tracing::debug!(
                            kind = ?error.kind,
                            "memory context fetch failed; degrading to empty memory for this run"
                        );
                        Vec::new()
                    }
                }
            })
            .await;
        cached.clone()

Comment on lines +137 to +140
let long_term_invocation = MemoryInvocation {
scope: short_term_invocation.scope.without_thread_and_mission(),
correlation_id: CorrelationId::new(),
};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Reusing the same correlation_id is important for distributed tracing and log correlation. Creating a new CorrelationId for the long-term lane breaks the correlation of the two lanes under the same request.

Suggested change
let long_term_invocation = MemoryInvocation {
scope: short_term_invocation.scope.without_thread_and_mission(),
correlation_id: CorrelationId::new(),
};
let long_term_invocation = MemoryInvocation {
scope: short_term_invocation.scope.without_thread_and_mission(),
correlation_id: short_term_invocation.correlation_id.clone(),
};

Comment on lines +142 to +160
let mut combined = self
.retrieve_lane(
short_term_invocation,
request.query.clone(),
request.max_snippets,
context_profile_id.clone(),
MemoryLane::ShortTerm,
)
.await
.map_err(map_memory_service_error)?;
.await;
combined.extend(
self.retrieve_lane(
long_term_invocation,
request.query,
request.max_snippets,
context_profile_id,
MemoryLane::LongTerm,
)
.await,
);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The short-term and long-term lane retrievals are independent I/O-bound operations. Running them sequentially adds unnecessary latency to the prompt generation path. We can run them concurrently using tokio::join! to significantly reduce latency.

        let (mut combined, long_term) = tokio::join!(
            self.retrieve_lane(
                short_term_invocation,
                request.query.clone(),
                request.max_snippets,
                context_profile_id.clone(),
                MemoryLane::ShortTerm,
            ),
            self.retrieve_lane(
                long_term_invocation,
                request.query,
                request.max_snippets,
                context_profile_id,
                MemoryLane::LongTerm,
            )
        );
        combined.extend(long_term);

Comment on lines +118 to +133
let user_content = history
.messages
.iter()
.find(|message| {
message.kind == MessageKind::User && message.turn_run_id.as_deref() == Some(run_id)
})
.and_then(|message| message.content.as_deref())?;
let assistant_content = history
.messages
.iter()
.find(|message| {
message.kind == MessageKind::Assistant
&& message.status == MessageStatus::Finalized
&& message.turn_run_id.as_deref() == Some(run_id)
})
.and_then(|message| message.content.as_deref())?;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If the user or assistant message content is empty or contains only whitespace, recording it wastes storage and pollutes the memory index. Filtering out empty/whitespace-only content ensures only meaningful interactions are recorded.

    let user_content = history
        .messages
        .iter()
        .find(|message| {
            message.kind == MessageKind::User && message.turn_run_id.as_deref() == Some(run_id)
        })
        .and_then(|message| message.content.as_deref())
        .filter(|content| !content.trim().is_empty())?;
    let assistant_content = history
        .messages
        .iter()
        .find(|message| {
            message.kind == MessageKind::Assistant
                && message.status == MessageStatus::Finalized
                && message.turn_run_id.as_deref() == Some(run_id)
        })
        .and_then(|message| message.content.as_deref())
        .filter(|content| !content.trim().is_empty())?;
References
  1. When canonicalizing a string, perform cheaper validation checks (like length, emptiness, and character set) on a trimmed slice before performing more expensive operations like replace that allocate a new string.

Comment on lines +105 to +111
if let Err(error) = self
.memory_writer
.record_interaction(invocation, request)
.await
{
debug!(error = %error, "after-turn memory: record_interaction failed; run already complete");
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Since record_interaction is a best-effort background task, awaiting it directly in the worker's execution path without a timeout can block the worker indefinitely if the memory provider hangs or is extremely slow. Wrapping the call in a timeout ensures the worker remains responsive.

        let record_result = tokio::time::timeout(
            std::time::Duration::from_secs(5),
            self.memory_writer.record_interaction(invocation, request),
        )
        .await;
        match record_result {
            Ok(Err(error)) => {
                debug!(error = %error, "after-turn memory: record_interaction failed; run already complete");
            }
            Err(_) => {
                debug!("after-turn memory: record_interaction timed out; run already complete");
            }
            _ => {}
        }

@BenKurrek

Copy link
Copy Markdown
Collaborator Author

Superseded by #5327 — reopened as a same-repo PR on nearai (this one was opened from the BenKurrek fork by mistake). Same branch/commit.

@BenKurrek BenKurrek closed this Jun 26, 2026
@BenKurrek BenKurrek deleted the reborn/memory-lifecycle branch June 26, 2026 12:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor: experienced 6-19 merged PRs risk: low Changes to docs, tests, or low-risk modules scope: dependencies Dependency updates scope: docs Documentation size: XL 500+ changed lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant